The notation, equations, and symbols follow https://theclevermachine.wordpress.com/2014/09/06/derivation-error-backpropagation-gradient-descent-for-neural-networks/. These equations were derived in our last session.
$z_j$: input to node $j$ for layer $l$
$g_j$: activation function for node $j$ in layer $l$ (applied to $z_j$)
$a_j = g_j(z_j)$: output/activation of node $j$ in layer $l$
$w_{i,j}$: weights connecting node $i$ in layer $(l-1)$ to node $j$ in layer $l$
$t_k$: target value for node $k$ in the output layer
$\delta_k = (a_k - t_k)\,g_k'(z_k)$: error (delta) term for node $k$ in the output layer
$\frac{\partial E}{\partial w_{j,k}} = \delta_k a_j$: gradient of the error with respect to the weight connecting hidden node $j$ to output node $k$
This notation was really confusing, so I had to break it down line by line. When I am looking at the output layer, $a_k$ is the output of node $k$ in the output layer and $a_j$ is the output coming from my hidden layer. $t_k$ is my target value, and $g_k'(z_k)$ is the derivative of the output activation evaluated at $z_k$. The delta term is what I will plug into gradient descent.
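To convince myself these two formulas do what I think, here is a minimal numpy sketch with made-up numbers (the array values, and helper names like d_sigmoid, are just for illustration and not part of the derivation):
In [ ]:
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    return sigmoid(x) * (1 - sigmoid(x))

# made-up values for one training example
a_j = np.array([0.2, 0.7])   # activations of two hidden nodes
z_k = np.array([0.5])        # input to the single output node
a_k = sigmoid(z_k)           # output of the output node
t_k = np.array([1.0])        # target

# delta_k = (a_k - t_k) * g'(z_k)
delta_k = (a_k - t_k) * d_sigmoid(z_k)

# dE/dw_jk = delta_k * a_j -> one gradient per hidden-to-output weight
dE_dw_jk = np.outer(a_j, delta_k)
delta_k, dE_dw_jk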
$\delta_j = g_j'(z_j)\sum_{k=1}^K \delta_k w_{j,k}$
Here $j$ indexes the hidden-layer node that the weight originates from, and $k$ indexes the output-layer node that the weight points to. In the case of a model with one hidden layer, $j$ runs over the hidden layer and $k$ runs over the output layer. You want to sum over all of the output nodes this hidden node feeds into, because when you do backprop, every one of those nodes is affected by a change in this node's activation. $\delta_k$ is the delta for the $k$th output node.
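Same sanity check for the hidden-layer delta, again with made-up numbers; w_jk here is a hypothetical 2x1 matrix of hidden-to-output weights:
In [ ]:
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    return sigmoid(x) * (1 - sigmoid(x))

z_j = np.array([0.1, -0.3])       # inputs to the two hidden nodes
delta_k = np.array([-0.12])       # output-layer delta from the previous step
w_jk = np.array([[0.4], [-0.9]])  # hidden-to-output weights, shape (n_hidden, n_output)

# delta_j = g'(z_j) * sum_k delta_k * w_jk
delta_j = d_sigmoid(z_j) * np.dot(w_jk, delta_k)
delta_j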
In [22]:
import numpy as np

class NN_backprop:
    def __init__(self, n_input, n_hidden, n_nodes):
        self.n_input = n_input + 1        # +1, presumably for a bias unit (not used yet)
        self.n_layers = n_hidden + 2      # input layer + hidden layers + output layer
        self.n_nodes = n_nodes
        # z_output[l] holds the inputs z_j to layer l, a_output[l] the activations a_j = g_j(z_j)
        self.z_output = np.ones((self.n_layers, self.n_nodes))
        self.a_output = np.ones((self.n_layers, self.n_nodes))
        self.delta = np.ones((self.n_layers, self.n_nodes))
        # w[l, i, j] connects node i in layer (l-1) to node j in layer l
        self.w = np.random.rand(self.n_layers, self.n_nodes, self.n_nodes)
        # stores the gradients used to update the weights
        self.c_ = np.zeros((self.n_layers, self.n_nodes, self.n_nodes))

    def runNN(self, inputs):
        self.a_output[0] = inputs
        # basically doing sigmoid(w^l a^(l-1)) to find the output of each layer
        for h in range(1, self.n_layers):
            for z_ in range(self.n_nodes):
                self.z_output[h][z_] = np.dot(self.a_output[h-1], self.w[h, :, z_])
                self.a_output[h][z_] = self.sigmoid(self.z_output[h][z_])
        return self.a_output[-1]

    def sigmoid(self, x):
        return 1.0/(1.0 + np.exp(-x))

    def derivative_sigmoid(self, x):
        return self.sigmoid(x)*(1 - self.sigmoid(x))

    def backPropagate(self, targets, eta):
        # https://theclevermachine.wordpress.com/2014/09/06/
        # derivation-error-backpropagation-gradient-descent-for-neural-networks/
        ### dE/dw_jk for the hidden-to-output weights
        self.delta[-1] = (self.a_output[-1] - targets)*self.derivative_sigmoid(self.z_output[-1])
        # delta[-1] is 1 x n_output and a_output[-2] is 1 x n_hidden, so their outer
        # product holds the gradient changes for the weights that go from hidden to output
        self.c_[-1, :, :] = np.outer(self.a_output[-2], self.delta[-1])
        self.w[-1, :, :] -= eta*self.c_[-1, :, :]
        # walk backwards over the hidden layers
        for h in range(self.n_layers - 2, 0, -1):
            for j in range(self.n_nodes):
                # delta_j = g'(z_j) * sum_k delta_k * w_jk, summing over the nodes in layer h+1
                self.delta[h][j] = self.derivative_sigmoid(self.z_output[h][j]) * \
                    np.sum(self.delta[h+1]*self.w[h+1, j, :])
                for i in range(self.n_nodes):
                    # these hold the gradient changes for the weights coming into layer h
                    self.c_[h, i, j] = self.delta[h][j]*self.a_output[h-1][i]
                    # update the weights
                    self.w[h, i, j] -= eta*self.c_[h, i, j]
In [25]:
pat = [
    [[0,0], [1]],
    [[0,1], [1]],
    [[1,0], [1]],
    [[1,1], [0]]
]
# the class uses a single n_nodes for every layer, so use 2 nodes per layer here
myNN = NN_backprop(2, 2, 2)
inputs = pat[0][0]    # first pattern's inputs
targets = pat[0][1]   # first pattern's target
myNN.runNN(inputs)
# myNN.backPropagate(targets, .1)
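Once backPropagate checks out, a rough training loop over the whole pattern set could look like the sketch below. The epoch count and eta are arbitrary, and because the class forces every layer (including the output) to have n_nodes nodes, the single target value just gets broadcast across both output nodes here:
In [ ]:
# rough training-loop sketch; values are arbitrary
myNN = NN_backprop(2, 2, 2)
eta = 0.1
for epoch in range(1000):
    for inputs, targets in pat:
        myNN.runNN(inputs)
        myNN.backPropagate(targets, eta)
# look at the outputs after training
for inputs, targets in pat:
    print(inputs, myNN.runNN(inputs))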
In [32]:
# sanity check: [-1,:,:] grabs the last layer's slice of a 3-D array
np.zeros((3,5,4))[-1,:,:]
Out[32]:
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
In [30]:
# sanity check: negative indexing also counts back from the end of a plain list
[1,2,3,4,5][-2]
Out[30]:
4